Quantifying Self-Organization with Optimal Predictors 
Despite broad interest in self-organizing systems, there are few quantitative, experimentally-applicable
criteria for self-organization. The existing criteria all give counter-intuitive results for
important cases. In this Letter, we propose a new criterion, namely an internally-generated
increase in the statistical complexity, the amount of information required for optimal prediction of the
system's dynamics. We precisely define this complexity for spatially-extended dynamical systems, 
using the probabilistic ideas of mutual information and minimal sufficient statistics. This leads 
to a general method for predicting such systems, and a simple algorithm for estimating statistical 
complexity. The results of applying this algorithm to a class of models of excitable media (cyclic 
cellular automata) strongly support our proposal. 
The term "self-organization" was coined in the 1940s
to label processes in which systems become more 
highly organized over time, without being ordered by out- 
side agents or by external programs. It has become one 
of the leading concepts of nonlinear science, without ever 
having been properly defined. The prevailing "I know 
it when I see it" standard prevents the development of 
a theory of self-organization. Thus some say that
"self-organizing" implies "dissipative" [?], while others
claim to exhibit reversible self-organization [?, ?], and no one
knows whether both groups are talking about the same idea.

A definition of self-organization should be mathemati- 
cally precise, so we can build theories around it, and ex- 
perimentally applicable, so we can use empirical data to 
say whether something self-organizes. The goal of such 
a definition should be both to match our informal no- 
tions in easy cases, where intuition is clear and consen- 
sual, and to extend unambiguously to intuitively hard or 
disputed cases. If our informal notions allow for com- 
parative, "more than" judgments, a formalization should 
match those, too. Generally there are many ways to for- 
malize a single concept, and competing formalizations 
must be judged by their scientific fruitfulness; differing 
formalizations may be appropriate in different contexts. 
(For more on such methodological issues, see [?].)

We believe we have a formal criterion for self- 
organization that meets the key requirements. It is pre- 
cise, unambiguous, and operational. We check its confor- 
mity with intuition against cellular automata, specifically 
cyclic cellular automata (CA). They are ideal test cases: 
their dynamics are completely known (because we specify 
them) and can easily be simulated exactly. They are rea- 
sonable qualitative models of excitable media, and there 
is an analytical theory [6] of the patterns they form. We
show that our definition works, at least in this case. Two
of us discussed preliminary work in [?]; here we present
the (concurring) results of larger, more extensive
simulations.^1

Measuring Organization. Few attempts have been
made to measure self-organization quantitatively.
Thermodynamic entropy is an obvious measure of organization
for physicists, and several works claim to measure
self-organization by finding spontaneous declines in
entropy [?, ?]. But thermodynamic entropy is a bad
measure of organization in complex systems [11].
Entropy is proportional to the logarithm of the accessible 
volume in phase space, which has no necessary connection
to any kind of organization. Thus low-temperature
states of Ising systems or Fermi fluids have very low
entropy, but no discernible organization [?]. Biological
organisms are never in pure, low-entropy states, but are
organized, if anything is. Some kinds of biological
self-organization are, in fact, thermodynamically driven by
increasing entropy [12], [15].

After "fall in entropy", the leading idea on how to 
measure self-organization, advanced in [16], is a rise in
complexity. While there are many proposed measures 
of physical complexity, the general view is that complex 
phenomena are ones which cannot be described concisely 
and accurately (see [14] for a general survey). Most
proposals use algorithmic descriptions, and are limited 
by inherent uncomputability. Here we take a stochastic 
point of view, aiming to statistically describe ensembles 
of configurations. We follow Grassberger [?] in defining
the complexity of a process as the least amount of
information about its state needed for maximally accurate
prediction. Crutchfield and Young [?] extended this
concept, by giving operational definitions of "maximally
accurate prediction" and "state".

^1 Strictly speaking, we quantify system organization. In
isolated systems, as in our simulations, this is necessarily
self-organization. Distinguishing self- from external organization in
systems receiving structured input is tricky; we discuss some
possible approaches below.

In any case, our subject is distinct from "self-organized
criticality" [8], a term labeling non-equilibrium systems whose
attractors show power-law fluctuations and long-range correlations.
We plan to address whether such systems are self-organizing in
our sense in future work.

The Grassberger-Crutchfield-Young "statistical complexity",
$C$, is the information content of the minimal sufficient
statistic for predicting the process's future [?]. In
thermodynamic settings, this is the amount of information
a full set of macrovariables contains about the system's
microscopic state [?]. We now sketch the formalism
allowing us to use statistical complexity to characterize
spatially-extended dynamical systems of arbitrary
dimension, after [21].

Let $x(\vec{r}, t)$ be an $(n+1)$D field, possibly stochastic, in
which interactions between different space-time points
propagate at speed $c$. As in [22], define the past light
cone of the space-time point $(\vec{r}, t)$ as all points which
could influence $x(\vec{r}, t)$, i.e., all points $(\vec{q}, u)$ where $u < t$
and $\|\vec{q} - \vec{r}\| \le c(t - u)$. The future light cone of $(\vec{r}, t)$ is
the set of all points which could be influenced by what
happens at $(\vec{r}, t)$. $l^-(\vec{r}, t)$ is the configuration of the field
in the past light cone, and $l^+(\vec{r}, t)$ the same for the future
light cone. The distribution of future light cone configurations,
given the configuration in the past, is $P(l^+|l^-)$.
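
To make the light-cone definitions concrete, here is a minimal sketch that extracts depth-limited past and future light cones from a simulated field. It assumes a 2D periodic square lattice stored as a NumPy array of shape (time, rows, cols), a speed of c cells per time step, and distance measured in the max-norm (box neighborhoods). The function name `light_cone` and its parameters are illustrative and not taken from the paper.

```python
import numpy as np

def light_cone(field, t, i, j, depth, c=1, past=True):
    """Return the depth-limited light cone of the point (t, i, j) as a tuple.

    field: integer array of shape (n_steps, rows, cols), periodic in space.
    past=True collects times t-1, ..., t-depth (strictly before t, as in
    the text); past=False collects times t+1, ..., t+depth.
    At temporal offset s the cone covers the box of half-width c*s around
    (i, j), i.e. all sites within max-norm distance c*s.
    """
    n_steps, rows, cols = field.shape
    values = []
    for s in range(1, depth + 1):
        u = t - s if past else t + s
        if not 0 <= u < n_steps:
            raise ValueError("light cone extends beyond the recorded times")
        w = c * s
        for di in range(-w, w + 1):
            for dj in range(-w, w + 1):
                values.append(int(field[u, (i + di) % rows, (j + dj) % cols]))
    return tuple(values)
```

Returning a tuple makes each configuration hashable, so observed cones can be used directly as dictionary keys when counting how often each future follows each past.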

Any function $\eta$ of $l^-$ defines a local statistic. It summarizes
the influence of all the space-time points which
could affect what happens at $(\vec{r}, t)$. Such local statistics
should tell us something about "what comes next,"
which is $l^+$. ([21] explains why we must use local predictors,
and the advantages of basing them on light cones,
as first suggested by [22].) Information theory lets us
quantify how informative different statistics are.

The information about variable $x$ in variable $y$ is

$$ I[x; y] \equiv \left\langle \log_2 \frac{P(x, y)}{P(x) P(y)} \right\rangle \tag{1} $$

where $P(x, y)$ is joint probability, $P(x)$ is marginal probability,
and $\langle \cdot \rangle$ is expectation [23]. The information a
statistic $\eta$ conveys about the future is $I[l^+; \eta(l^-)]$. A
statistic is sufficient if it is as informative as possible
[?], here if and only if $I[l^+; \eta(l^-)] = I[l^+; l^-]$. This is
the same [?] as requiring that $P(l^+|\eta(l^-)) = P(l^+|l^-)$.
A sufficient statistic retains all the predictive information
in the data. Decision theory [24] tells us that maximally
accurate and precise prediction needs only a sufficient
statistic, not the original data; in fact, any predictor
which does not use a sufficient statistic can be replaced
by a superior one which does. Since we want optimal
prediction, we confine ourselves to sufficient statistics.
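
As an illustration of the sufficiency condition, the sketch below computes a plug-in estimate of the mutual information from sampled pairs and compares a sufficient statistic with an insufficient one. The toy process (the future copies the first component of a two-component past) and the name `mutual_information` are assumptions made for the example only.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate of I[x; y] in bits from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # P(x,y) / (P(x) P(y)) = n_xy * n / (n_x * n_y) in terms of counts.
    return sum(
        (nxy / n) * math.log2(nxy * n / (px[x] * py[y]))
        for (x, y), nxy in joint.items()
    )

# Toy process: the past is a pair (a, b), and the future simply equals a.
samples = [((a, b), a) for a in (0, 1) for b in (0, 1)]

info_full = mutual_information([(fut, past) for past, fut in samples])
info_eta = mutual_information([(fut, past[0]) for past, fut in samples])
info_bad = mutual_information([(fut, past[1]) for past, fut in samples])
```

Here the statistic that keeps only `a` carries as much information about the future as the whole past (`info_eta` equals `info_full`), so it is sufficient, while the statistic that keeps only `b` carries none.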

If we use a sufficient statistic $\eta$ for prediction, we must
describe or encode it. Since $\eta(l^-)$ is a function of $l^-$,
this encoding takes $I[\eta(l^-); l^-]$ bits. If knowing $\eta_1$ lets
us compute $\eta_2$, which is also sufficient, then $\eta_2$ is a more
concise summary, and $I[\eta_1(l^-); l^-] \ge I[\eta_2(l^-); l^-]$. A
minimal sufficient statistic [23] can be computed from
any other sufficient statistic. We now construct one.

Take two past light cone configurations, $l_1^-$ and $l_2^-$.
Each has some conditional distribution over future light
cone configurations, $P(l^+|l_1^-)$ and $P(l^+|l_2^-)$ respectively.
The two past configurations are equivalent, $l_1^- \sim l_2^-$, if
those conditional distributions are equal. The set of configurations
equivalent to $l^-$ is $[l^-]$. Our statistic is the
function which maps past configurations to their equivalence
classes:

$$ \epsilon(l^-) \equiv [l^-] = \left\{ \lambda : P(l^+|\lambda) = P(l^+|l^-) \right\} \tag{2} $$

Clearly, $P(l^+|\epsilon(l^-)) = P(l^+|l^-)$, and so $I[l^+; \epsilon(l^-)] =
I[l^+; l^-]$, making $\epsilon$ a sufficient statistic. The equivalence
classes, the values $\epsilon$ can take, are the causal states
[?, ?, ?, 21]. Each causal state is a set of specific past
light-cones, and all the cones it contains are equivalent,
predicting the same possible futures with the same probabilities.
Thus there is no advantage to subdividing the
causal states, which are the coarsest set of predictively
sufficient states.

For any sufficient statistic $\eta$, $P(l^+|l^-) = P(l^+|\eta(l^-))$.
So if $\eta(l_1^-) = \eta(l_2^-)$, then $P(l^+|l_1^-) = P(l^+|l_2^-)$, and the
two pasts belong to the same causal state. Since we can
get the causal state from $\eta(l^-)$, we can use the latter to
compute $\epsilon(l^-)$. Thus, $\epsilon$ is minimal. Moreover, $\epsilon$ is the
unique minimal sufficient statistic [21]: any other just
relabels the same states.

Because $\epsilon$ is minimal, $I[\epsilon(l^-); l^-] \le I[\eta(l^-); l^-]$, for
any other sufficient statistic $\eta$. Thus we can speak objectively
about the minimal amount of information needed
to predict the system, which is how much information
about the past of the system is relevant to predicting its
own dynamics. This quantity, $I[\epsilon(l^-); l^-]$, is a characteristic
of the system, and not of any particular model. We
define the statistical complexity as

$$ C \equiv I[\epsilon(l^-); l^-] \tag{3} $$

$C$ is the amount of information required to describe the
behavior at that point, and equals the log of the effective
number of causal states, i.e., of different distributions for
the future. Complexity lies between disorder and order
[?], and $C = 0$ both when the field is completely
disordered (all values of $x$ are independent) and completely
ordered ($x$ is constant). $C$ grows when the field's
dynamics become more flexible and intricate, and more
information is needed to describe the behavior.
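
The "log of the effective number of causal states" reading can be made explicit with a standard identity for discrete variables. The short derivation below is an added illustration rather than part of the original text; the symbol $\sigma$ for a causal state and $P(\sigma)$ for its probability are notation introduced here. Since $\epsilon(l^-)$ is a deterministic function of $l^-$, its conditional entropy given $l^-$ vanishes:

```latex
\begin{align*}
C &= I[\epsilon(l^-); l^-]
   = H[\epsilon(l^-)] - H[\epsilon(l^-) \mid l^-]
   = H[\epsilon(l^-)] \\
  &= -\sum_{\sigma} P(\sigma) \log_2 P(\sigma),
  \qquad P(\sigma) = \sum_{l^- \in \sigma} P(l^-).
\end{align*}
```

With $N$ equally probable causal states this gives $C = \log_2 N$ bits, and with a single causal state (as for a completely disordered or completely ordered field) it gives $C = 0$.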

We now sketch an algorithm to recover the causal
states from data, and so estimate $C$. ([21] provides details,
including pseudocode; cf. [?].) At each time $t$, list
the observed past and future light-cone configurations,
and put the observed past configurations in some arbitrary
order, $\{l_i^-\}$. (In practice, we must limit how far
light-cones extend into the past or future.) For each past
configuration $l_i^-$, estimate $\hat{P}_t(l^+|l_i^-)$. We want to estimate
the states, which ideally are groups of past cones
with the same conditional distribution over future cone
configurations. Not knowing the conditional distributions
a priori, we must estimate them from data, and
with finitely many samples, such estimates always have
some error. Thus, we approximate the true causal states
by clusters of past light-cones with similar distributions
over future light-cones; the conditional distribution for a
cluster is the weighted mean of those of its constituent
past cones. Start by assigning the first past, $l_1^-$, to the
first cluster. Thereafter, for each $l_i^-$, go down the list of
existing clusters and check whether $\hat{P}_t(l^+|l_i^-)$ differs
significantly from each cluster's distribution, as determined
by a fixed-size $\chi^2$ test. (We used $\alpha = 0.05$ in our
simulations below.) If the discrepancy is insignificant, add $l_i^-$
to the first matching cluster, updating the latter's
distribution. Make a new cluster if $l_i^-$ does not match any
existing cluster. Continue until every $l_i^-$ is assigned to
some cluster. The clusters are then the estimated causal
states at time $t$. Finally, obtain the probabilities of the
different causal states from the empirical probabilities of
their constituent past configurations, and calculate $C(t)$.
This procedure converges on the correct causal states as
it gets more data, independent of the order of presentation
of the past light-cones, the ordering of the clusters,
or the size $\alpha$ of the significance test [21]. For finite data,
the order of presentation matters, but we finesse this by
randomizing the order.
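
The clustering procedure just described can be sketched as follows. This is a simplified illustration, not the authors' implementation (see [21] for their pseudocode). It assumes the significance test is a two-sample chi-squared homogeneity test on raw counts, approximates the chi-squared critical value with the Wilson-Hilferty formula so that only the standard library is needed, and uses hypothetical names (`chi2_critical`, `differs`, `estimate_causal_states`).

```python
import math
import random
from collections import Counter, defaultdict
from statistics import NormalDist

def chi2_critical(df, alpha):
    """Approximate upper-alpha chi-squared quantile (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3

def differs(c1, c2, alpha):
    """Two-sample chi-squared test on two Counters of future cones."""
    n1, n2 = sum(c1.values()), sum(c2.values())
    support = set(c1) | set(c2)
    if len(support) < 2:
        return False  # both samples show one identical outcome
    stat = 0.0
    for f in support:
        total = c1[f] + c2[f]
        for obs, n in ((c1[f], n1), (c2[f], n2)):
            expected = total * n / (n1 + n2)
            stat += (obs - expected) ** 2 / expected
    return stat > chi2_critical(len(support) - 1, alpha)

def estimate_causal_states(samples, alpha=0.05, seed=0):
    """Cluster past cones by their empirical distributions over futures.

    samples: iterable of (past, future) pairs of hashable configurations.
    Returns (clusters, C): a list of sets of pasts and the complexity in bits.
    """
    futures = defaultdict(Counter)
    for past, future in samples:
        futures[past][future] += 1
    pasts = list(futures)
    random.Random(seed).shuffle(pasts)  # randomize the order of presentation
    clusters = []  # each entry: (set of member pasts, pooled future counts)
    for p in pasts:
        for members, pooled in clusters:
            if not differs(futures[p], pooled, alpha):
                members.add(p)
                pooled.update(futures[p])  # pooling counts = weighted mean
                break
        else:
            clusters.append(({p}, Counter(futures[p])))
    n = sum(sum(pooled.values()) for _, pooled in clusters)
    probs = [sum(pooled.values()) / n for _, pooled in clusters]
    complexity = -sum(q * math.log2(q) for q in probs)
    return [members for members, _ in clusters], complexity
```

Pooling the raw counts of a cluster's members is what makes the cluster distribution the count-weighted mean of its constituents' distributions. The complexity is computed as the entropy of the estimated state probabilities, which equals the mutual information in Eq. (3) because each past belongs to exactly one cluster.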

We say a system has organized between times $t_1$ and
$t_2$ if (I) $C(t_2) - C(t_1) \equiv \Delta C > 0$. It has self-organized
if (II) some of the rise in complexity is not due to external
agents. We can check condition (I) by estimating
$\Delta C$. We know condition (II) holds for many systems,
because they either have no external inputs (e.g.,
deterministic CA), or only unstructured inputs (e.g., chemical
pattern-formers exposed to thermal noise). For systems
with structured input, we need, but lack, a way to say
how much of $\Delta C$ is due to that input. We could, perhaps,
treat this as a causal inference problem [25], with
$\Delta C$ as the response variable, and the input as the treatment.
Alternately, we could see how much $\Delta C$ changes if
we replace the input with statistically-similar noise [?].

Numerical Experiments and Results. Having developed
a quantitative criterion for self-organization, we
now check it experimentally. Our test systems are cyclic
cellular automata [6] (CCA), which are models of pattern
formation in excitable media [?]. Each site in a square
lattice has one of $\kappa$ colors. A cell of color $k$ will change
its color to $k + 1 \bmod \kappa$ if there are already at least $T$
("threshold") cells of that next color in its neighborhood, i.e.,
within a distance $r$ ("range") of that cell. Otherwise,
the cell keeps its current color. (In normal excitable media,
which have a unique quiescent state, the role of the
threshold is slightly different [27].) All cells update their
colors in parallel.
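
A minimal sketch of this update rule, assuming a periodic square lattice stored as a NumPy integer array and a box (Moore) neighborhood of range `r`. The function name `cca_step` and its argument names are illustrative choices.

```python
import numpy as np

def cca_step(grid, n_colors, threshold, r=1):
    """One synchronous update of a cyclic CA on a periodic square lattice.

    A cell of color k advances to (k + 1) % n_colors when at least
    `threshold` cells in its range-r box neighborhood already have that
    successor color; otherwise it keeps its current color.
    """
    successor = (grid + 1) % n_colors
    count = np.zeros(grid.shape, dtype=int)
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            if di == 0 and dj == 0:
                continue  # skip the cell itself
            neighbor = np.roll(grid, shift=(di, dj), axis=(0, 1))
            count += (neighbor == successor)
    return np.where(count >= threshold, successor, grid)
```

A run could start from uniform random colors, for example `np.random.default_rng(0).integers(0, 4, size=(300, 300))`, and apply `cca_step` repeatedly, recording the grid at each time step for later light-cone extraction.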

FIG. 1: (Color online.) Phases of the cyclic CA. Parameters
are as described in the text, started from uniform random
initial conditions. Color figures were prepared with [28]. From
the top left: (a) Local oscillations (T = 1), in which the CA
oscillates with period 4, each cell cycling through all colors;
(b) Spiral waves (T = 2); (c) The "turbulent" phase (T = 3);
(d) Fixation with solid color blocks (T = 4).

CCA have three generic long-run behaviors, depending
on the ratio of the threshold to the range. At high
thresholds, CCA form homogeneous blocks of solid
colors, which are completely static ("fixation"). At very
low thresholds, the entire lattice eventually oscillates pe- 
riodically; sometimes rotating spiral waves grow to engulf 
the entire lattice. With intermediate thresholds, incoher- 
ent traveling waves form, propagate, collide and disperse; 
this, metaphorically, is "turbulence". With a range-one
Moore (box) neighborhood and $\kappa = 4$, the phenomenology
is as follows [?] (see Fig. 1). T = 1 and T = 2 are
both locally periodic, but T = 2 produces spiral waves,
while T = 1 quenches incoherent local oscillations. T = 3
leads to meta-stable turbulence: spiral waves can form
and entrain the entire CA, but turbulence can persist
indefinitely on finite lattices. Fixation occurs with $T \ge 4$.
All CCA phases self-organize when started from uniform
noise. (This is best appreciated by viewing simulations [?].)
By the same intuitive standard, the fixation phase 
is less organized than turbulence (which has dynamic, 
large-scale spatial structures), which in turn is less organized
than spiral waves (which have more intricate structures).
It is hard to say, by eye, whether incoherent local
oscillations are more or less organized than simple
fixation. All four regimes lead to stable stationary dis- 
tributions. Thus, C should start at zero (reflecting the 
totally random initial conditions), rise to a steady value, 
and stay there. T = 2 should have the highest long-run 
complexity, followed by T = 3. 

We ran $\kappa = 4$, $r = 1$ CCA on 300 × 300 lattices with
periodic boundary conditions, for T from 1 to 4. Figure
2 shows the results of applying our proposed measure
of self-organization to these simulations. We used light-cones
extending 1 time-step into both past and future;
longer light-cones did not, here, lead to different states.
The agreement with expectations is clear. All four curves
climb steadily to plateaus, leveling off when the distribution
of CA configurations becomes stationary. Sampling
noise leads to fluctuations around the asymptotic values
[?]. The slight fall in complexity for T = 3 occurs when
spirals try to form but break up, and their debris limits
further spiral formation. Additional simulations at different
lattice sizes $L$ show the estimated long-run complexity
growing with $L$, approaching a limit as $O(L^{-1})$. This
rate combines finite-size effects with the negative bias of
our information estimator, which is at least $O(L^{-2})$ [?].
We hope in the future to precisely determine both our
estimation bias and the finite-size scaling of the complexity.

FIG. 2: (Color online.) Complexity over time for CCA with
different thresholds T, averaging 30 independent simulations
at each value of T. The T = 2 curve has the highest asymptote,
followed by T = 3, T = 4 and T = 1. Error bars:
standard error of the complexity.

Conclusion. A theory of self-organization should predict
when and why different systems will assume different
kinds and degrees of organization. This will require an 
adequate characterization of self-organization. We ar- 
gue that "internally-caused rise in complexity" works, 
if we define complexity as the amount of information 
needed for optimal statistical prediction. We can reli- 
ably estimate this statistical complexity from data, and 
for CCA, the estimates match intuitive judgments about 
self-organization. The methods used are not limited to 
CA, but apply to all kinds of discrete random fields, 
including ones on complex networks [21]. They would
work equally well on discretized empirical data, e.g., dig- 
ital movies of chemical pattern formation experiments. 
This is a first step towards a physical theory of self- 
organization. 